{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "### Lab 9 - Computing probabilities 2\n", "\n", "This lab will use the 311 service request dataset from NYC Open Data you downloaded in Lab 8.\n", "\n", "In this lab, we will look at probabilities involving *and* and *or*, which are computed using the *and* and *or* methods of combining filters from Lab 9. \n", "\n", "As usual, we will import the matplotlib and pandas packages, and set plots to appear in the Jupyter notebook." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, write code to load your 311 data into a dataframe called `calls`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check that the dataframe was created properly by displaying it." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Probabilities using or\n", "\n", "Suppose we want to know what percentage of complaints are about residential or commercial noise. The formula for this probability is:\n", "\n", "$\\text{probability a call is about residential or commercial noise} = \\frac{\\text{# of calls about residential or commercial noise}}{\\text{total # of calls}}$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Residential and commercial noise are listed as separate complaint types, namely `Noise - Residential` and `Noise - Commercial`. 
\n", "\n", "Can you use what you learned in Lab 9 to find the number of calls about residential or commercial noise?\n", "\n", "One way to do this is:\n", "- create a filter for residential noise\n", "- create a filter for commercial noise\n", "- use the two filters to create a new dataframe of only residential or commercial noise complaints\n", "- count the number of rows in the new dataframe\n", "\n", "First, create a filter for residential noise:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Pattern:\n", " filter_variable = dataframe[\"column_name\"] == \"filter value\"\n", "
\n", "\n", "Now create the filter for commercial noise:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, create a new dataframe of only residential or commercial noise complaints. Do you combine your residential and commercial filters using & (and) or | (or)?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Answer:\n", " noise_calls = calls[res_noise_filter | commerical_noise_filter]\n", "
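To see how `|` behaves, here is a minimal sketch on a tiny made-up dataframe. The name `toy_calls`, the other `toy_` variables, and the five rows are invented for illustration and are not the real 311 data; only the `Complaint Type` column name and the two noise values come from this lab.

```python
import pandas as pd

# Made-up rows for illustration only (not the real 311 dataset)
toy_calls = pd.DataFrame({
    'Complaint Type': ['Noise - Residential', 'Noise - Commercial',
                       'HEAT/HOT WATER', 'Noise - Residential',
                       'Illegal Parking'],
})

# One boolean filter per complaint type
toy_res_filter = toy_calls['Complaint Type'] == 'Noise - Residential'
toy_com_filter = toy_calls['Complaint Type'] == 'Noise - Commercial'

# | keeps a row when at least one of the two filters is True
toy_noise_calls = toy_calls[toy_res_filter | toy_com_filter]

# Numerator: rows that passed the combined filter; denominator: all rows
toy_prob = len(toy_noise_calls) / len(toy_calls)
print(toy_prob)  # 3 of the 5 made-up rows are noise complaints
```

With `&` instead of `|`, the result here would be empty, because a single call cannot have both complaint types at once.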
\n", "\n", "Finally, count how many rows are in your new dataframe and save the number in a variable:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "num_noise_calls = noise_calls.shape[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Hint:\n", " Compute the number of rows of a dataframe called `df` using `len(df)` or `df.shape[0]`.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We've computed the numerator. Now let's compute the denominator, which is the total number of 311 calls:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, compute the probability that a call is about residential or commercial noise:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What percentage of calls are about residential or commercial noise? Are you surprised? \n", "\n", "## Probabilities using and\n", "\n", "What if we wanted to find the probability that a call is a residential noise complaint from the Bronx?\n", "\n", "The formula is:\n", "$\\text{probability a call is a residential noise complaint from the Bronx} = \\frac{\\text{# calls that are residential noise complaints and from the Bronx}}{\\text{total # of calls}}$\n", "\n", "We'll compute the numerator first. We already made a filter to check if a complaint is about residential noise, so we just need a filter to check if the `Borough` column has the value `BRONX` (since the boroughs are stored in all caps). \n", "\n", "Create the Bronx filter:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Pattern:\n", " filter_name = dataframe[\"column_name\"] == \"filter_value\"\n", "
\n", "\n", "This time we want calls to both be about residential noise *and* from the Bronx. Use the two filters to create a new dataframe of only residential noise complaints from the Bronx." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Answer:\n", " bronx_and_res_noise = calls[res_noise_filter & bronx_filter]\n", "
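The difference between `&` and `|` is easiest to see side by side. The sketch below uses a small made-up dataframe; `toy_calls`, the other `toy_` names, and the four rows are assumptions for illustration, not the real 311 data.

```python
import pandas as pd

# Made-up rows for illustration only (not the real 311 dataset)
toy_calls = pd.DataFrame({
    'Complaint Type': ['Noise - Residential', 'Noise - Residential',
                       'HEAT/HOT WATER', 'Noise - Commercial'],
    'Borough': ['BRONX', 'QUEENS', 'BRONX', 'BRONX'],
})

toy_res_filter = toy_calls['Complaint Type'] == 'Noise - Residential'
toy_bronx_filter = toy_calls['Borough'] == 'BRONX'

# & keeps a row only when both filters are True
toy_both = toy_calls[toy_res_filter & toy_bronx_filter]

# | keeps a row when at least one filter is True
toy_either = toy_calls[toy_res_filter | toy_bronx_filter]

print(len(toy_both), len(toy_either))  # 1 row matches both, 4 rows match at least one
```

If you write the comparisons inline instead of saving them in variables, wrap each one in parentheses, for example `calls[(a == x) & (b == y)]`, because `&` and `|` bind more tightly than `==` in Python.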
\n", "\n", "Display your new dataframe to make sure it is correct." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Count the number of rows in your new dataframe to count the number of complaints about residential noise in the Bronx and store the value in a variable:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, can you compute the probability that a call is a residential noise complaint from the Bronx?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Answer:\n", " num_bronx_and_res_noise/num_calls\n", "
\n", "\n", "What percentage of 311 calls are residential noise complaints from the Bronx?\n", "\n", "Note that in this case the probability is also the proportion of 311 calls about residential noise complaints in the Bronx. Here the probability and proportion are the same because we are estimating the probability from the data.\n", "\n", "Sometimes it's hard to interpret a single probability. Let's compare the probability that a call from the Bronx is about no heat/hot water with the probability that a call from Manhattan is about no heat/hot water.\n", "\n", "First write down the formula for computing the probability that a call from the Bronx is about no heat/hot water.\n", "\n", "
Answer:\n", "$\\text{probability a call from the Bronx is about no heat/hot water} = \\frac{\\text{# calls from the Bronx about no heat/hot water}}{\\text{# of calls from the Bronx}}$\n", "
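Notice that the denominator here is the number of calls from the Bronx, not the total number of calls. A minimal sketch on made-up data shows how much that choice changes the result; `toy_calls`, the other `toy_` names, and the six rows are invented for illustration and are not the real 311 data.

```python
import pandas as pd

# Made-up rows for illustration only (not the real 311 dataset)
toy_calls = pd.DataFrame({
    'Borough': ['BRONX', 'BRONX', 'BRONX', 'BRONX',
                'MANHATTAN', 'MANHATTAN'],
    'Complaint Type': ['HEAT/HOT WATER', 'HEAT/HOT WATER',
                       'Noise - Residential', 'Noise - Commercial',
                       'HEAT/HOT WATER', 'Noise - Commercial'],
})

toy_bronx_filter = toy_calls['Borough'] == 'BRONX'
toy_heat_filter = toy_calls['Complaint Type'] == 'HEAT/HOT WATER'

toy_num_calls = len(toy_calls)                                          # all rows
toy_num_bronx = len(toy_calls[toy_bronx_filter])                        # Bronx rows only
toy_num_bronx_heat = len(toy_calls[toy_bronx_filter & toy_heat_filter])

# Probability that a call from the Bronx is about heat: divide by Bronx calls
toy_prob_heat_from_bronx = toy_num_bronx_heat / toy_num_bronx

# Probability that a call is about heat and from the Bronx: divide by all calls
toy_prob_bronx_and_heat = toy_num_bronx_heat / toy_num_calls

print(toy_prob_heat_from_bronx, toy_prob_bronx_and_heat)
```

Both probabilities share the same numerator; only the group of calls you divide by is different.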
\n", "\n", "Next, compute the number of calls from the Bronx and save it in a variable." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, compute the number of calls from the Bronx about no heat/hot water and save it in a variable. These calls will have `BRONX` in the `Borough` column and `HEAT/HOT WATER` in the `Complaint Type` column." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, compute the probability that a call from the Bronx is about no heat/hot water." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next we'll compute the probability that a call from Manhattan is about no heat/hot water. Try to do this below. If you need to add more code cells, you can do it by clicking Insert in the menu." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How does the probability that a call from Manhattan is about no heat/hot water compare to the probability that a call from the Bronx is about no heat/hot water?\n", "\n", "#### Challenges:\n", "- What is the probability that a call is from Brooklyn or Queens?\n", "- What is the probability that the location type is `Street/Sidewalk` and the call is from Staten Island?\n", "- What is the probability that a call about no heat/hot water is from the Bronx? 
(Note that this is different from the probability that a call from the Bronx is about no heat/hot water.)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.3" } }, "nbformat": 4, "nbformat_minor": 2 }